Additional Parameters for CSV, JSON, and Parquet file formats
When configuring the CSV, JSON, or Parquet file formats, you can specify additional parameters beyond basics such as the field delimiter and parse header, for example ignoreLeadingWhiteSpace or multiLine. These parameters control how the file is read and interpreted during data ingestion or processing, and they ensure that the data is correctly structured and aligned with your schema before ingestion.
For example, selecting the correct Field Delimiter ensures that the data integration tool splits the data into the correct columns during ingestion. Parse Header specifies whether the first line of a CSV file is treated as a header row containing column names; setting it correctly ensures that header values are not loaded as data.
CSV
Using additional parameters for the CSV file format ensures:

- Data is parsed correctly.
- Schema mismatches and incorrect column alignment are prevented.
- Data from multiple sources with different formatting conventions is supported.

The following additional parameters are supported for the CSV file format.
| Parameter | Description |
| --- | --- |
| comment | Specifies the character that identifies comment lines in the file. Any line beginning with this character is ignored during processing. Example: Setting # treats all lines starting with # as comments. |
| date_format | Defines the format of date values in the data files (data loading) or table (data unloading). |
| emptyValue | Sets the string representation of an empty value. |
| encoding | Defines the character encoding used to read the file. Use UTF-8 for most datasets, especially those containing special or non-English characters. |
| escape | Character used to escape special symbols within a field, such as a quote or delimiter. Example: Using \ allows "New \"York\"" to be read correctly. |
| ignoreLeadingWhiteSpace | When enabled, removes extra white space at the beginning of each field value. Helps ensure consistent parsing of fields with irregular spacing. |
| ignoreTrailingWhiteSpace | When enabled, removes white space at the end of each field value. Useful for avoiding mismatches caused by unintended spaces. |
| multiLine | Allows fields to span multiple lines if enclosed in quotes. Enable this option when your data contains line breaks within text fields. |
| nullValue | Defines the string that should be interpreted as a null or missing value. Example: Setting null or NA ensures those tokens are treated as empty values. |
| quote | Character used to enclose values containing delimiters, quotes, or line breaks. Commonly set to ". Example: "New York, USA" is treated as one value. |
| time_format | Defines the format of time values in the data files (data loading) or table (data unloading). |
| empty_field_as_null | When loading data, specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters (for example, ,,). When unloading data, this option is used in combination with FIELD_OPTIONALLY_ENCLOSED_BY. |
| field_delimiter | One or more singlebyte or multibyte characters that separate fields in an input file (data loading) or unloaded file (data unloading). |
| parse_header | Boolean that specifies whether to use the first-row headers in the data files to determine column names. |
| record_delimiter | One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading). |
| trim_space | Boolean that specifies whether to remove white space from fields. |
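Several of these options map onto Python's standard csv module, which can sketch their effect. The sample data, and the convention of turning empty fields into None, are illustrative assumptions for this example, not the behavior of any specific ingestion tool:

```python
import csv
import io

# Sample data exercising three of the parameters above: a quoted field
# containing the delimiter, leading spaces after a delimiter, and an
# empty field that should be treated as NULL.
raw = 'city,country,note\n"New York, USA",US,  ok\n,,fine\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter=",",         # field_delimiter
    quotechar='"',         # quote
    skipinitialspace=True, # ignoreLeadingWhiteSpace
)

header = next(reader)  # parse_header: first row supplies column names
rows = [
    # empty_field_as_null: map empty strings to None
    [None if field == "" else field for field in row]
    for row in reader
]

print(header)  # ['city', 'country', 'note']
print(rows)    # [['New York, USA', 'US', 'ok'], [None, None, 'fine']]
```

Because the quote character encloses "New York, USA", the embedded comma is kept inside a single value instead of splitting the row into four columns.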
JSON
Using additional parameters for the JSON file format ensures:

- Precise schema discovery during ingestion.
- Consistent rule application in profiling, validation, and transformation.
- Fewer errors caused by incorrect file interpretation (e.g., misaligned columns, missing headers).

The following additional parameters are supported for the JSON file format.
| Parameter | Description |
| --- | --- |
| allowSingleQuotes | Boolean that specifies whether single quotes are allowed around field names and string values in addition to double quotes. |
| allowUnquotedFieldNames | Boolean that specifies whether unquoted JSON field names are allowed. |
| date_format | Defines the format of date values in the data files (data loading) or table (data unloading). |
| time_format | Defines the format of time values in the data files (data loading) or table (data unloading). |
| multi_line | Boolean that specifies whether multiple lines are allowed. |
| strip_outer_array | Boolean that instructs the JSON parser to remove outer brackets (i.e., [ ]). |
| null_if | String used to convert to and from SQL NULL. |
| encoding | For reading, forcibly sets one of the standard basic or extended encodings for the JSON files. For writing, specifies the encoding (charset) of the saved JSON files. JSON built-in functions ignore this option. |
| trim_space | Boolean that specifies whether to remove leading and trailing white space from strings. |
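To illustrate what strip_outer_array and null_if accomplish, here is a minimal sketch using Python's standard json module. The payload and the "NA" token are invented for the example:

```python
import json

# Hypothetical payload: an outer array of records, where the producer
# uses the token "NA" to mean "missing" (the null_if idea).
raw = '[{"city": "Pune", "temp": "NA"}, {"city": "Oslo", "temp": 4}]'

# strip_outer_array: treat the outer [ ] as a container of records
# and process each record individually.
records = json.loads(raw)

NULL_IF = {"NA", "null"}  # null_if: tokens to convert to SQL NULL / None

cleaned = [
    {key: (None if value in NULL_IF else value) for key, value in rec.items()}
    for rec in records
]

print(cleaned)  # [{'city': 'Pune', 'temp': None}, {'city': 'Oslo', 'temp': 4}]
```

Without stripping the outer array, a loader would see one large array value instead of two independent records, which is why the option matters for schema discovery.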
Parquet
Using additional parameters for the Parquet file format ensures:

- Improved schema and data type handling.
- Fine-grained control over data ingestion.
- Better compatibility with external systems.

The following additional parameters are supported for the Parquet file format.
| Parameter | Description |
| --- | --- |
| binary_as_text | Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text. When set to FALSE, Snowflake interprets these columns as binary data. |
| trim_space | Boolean that specifies whether to remove leading and trailing white space from strings. |
| use_vectorized_scanner | Boolean that specifies whether to use a vectorized scanner for loading Parquet files. |
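As a sketch, the three Parquet parameters above can be combined in a Snowflake named file format; the format name here is illustrative, and the values shown are examples rather than recommended defaults:

```sql
-- Hypothetical named file format applying the Parquet parameters above.
CREATE OR REPLACE FILE FORMAT my_parquet_format
  TYPE = PARQUET
  BINARY_AS_TEXT = TRUE          -- read untyped binary columns as UTF-8 text
  TRIM_SPACE = TRUE              -- strip leading/trailing white space from strings
  USE_VECTORIZED_SCANNER = TRUE; -- use the vectorized scanner when loading
```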
What's next? Ingesting Data from Amazon S3 into a Snowflake Data Lake